Abstract

In this paper we consider the time complexity of adding two n-bit numbers together within the tile
self-assembly model. The (abstract) tile assembly model is a mathematical model of self-assembly in which
system components are square tiles with different glue types assigned to tile edges. Assembly is driven
by the attachment of singleton tiles to a growing seed assembly when the net force of glue attraction for a
tile exceeds some fixed threshold. Within this framework, we examine the time complexity of computing
the sum of two n-bit numbers, where the input numbers are encoded in an initial seed assembly, and the
output sum is encoded in the final, terminal assembly of the system. We show that this problem, along
with multiplication, has a worst case lower bound of $\Omega(\sqrt{n})$ in 2D assembly, and
$\Omega(\sqrt[3]{n})$ in 3D assembly. We further design algorithms for both 2D and 3D that meet this bound
with worst case run times of $O(\sqrt{n})$ and $O(\sqrt[3]{n})$ respectively, which beats the previous best
known upper bound of $O(n)$. Finally, we consider average case complexity of addition over uniformly
distributed n-bit strings and show how to achieve $O(\log n)$ average case time with a simultaneous
$O(\sqrt{n})$ worst case run time in 2D. As additional evidence for the speed of our algorithms, we
implement our algorithms, along with the simpler $O(n)$ time algorithm, in a probabilistic run-time
simulator and compare the timing results.



1 Introduction. 

Self-assembly is the process by which systems of simple objects autonomously organize themselves through 
local interactions into larger, more complex objects. Self-assembly processes are abundant in nature and 
serve as the basis for biological growth and replication. Understanding how to design and efficiently program 
molecular self-assembly systems promises to be fundamental for the future of nanotechnology. One particular 
direction of interest is the design of molecular computing systems for the efficient solution of fundamental 
computational problems. In this paper we study the complexity of computing arithmetic primitives within 
a well studied model of algorithmic self-assembly, the abstract tile assembly model. 

The abstract tile assembly model (aTAM) models system monomers with four-sided Wang tiles with glue
types assigned to each edge. Assembly proceeds by tiles attaching, one by one, to a growing initial seed
assembly whenever the net glue strength of attachment exceeds some fixed temperature threshold. The aTAM
has been shown to be capable of universal computation [15], and research leveraging this computational
power has led to efficient assembly of complex geometric shapes and patterns with a number of recent
results in FOCS, SODA, and ICALP [5]-[12], [14]. This universality also allows the model to serve directly
as a model for computation in which an input bit string is encoded into an initial assembly. The process of
self-assembly and the final produced terminal assembly represent the computation of a function on the given
input. Given this framework, it is natural to ask how fast a given function can be computed in this model.
Tile assembly systems can be designed to take advantage of massive parallelism when multiple tiles attach at
distinct positions in parallel, opening the possibility for faster algorithms than what can be achieved in more
traditional computational models. On the other hand, tile assembly algorithms must use up geometric space
to perform computation, and must pay substantial time costs when communicating information between two
physically distant bits. This creates a host of challenges unique to this physically motivated computational
model that warrant careful study.
(a) Incorrect binding. (b) Correct binding.

Figure 1: Cooperative tile binding in the aTAM. 



In this paper we consider the time complexity of adding two n-bit numbers within the abstract tile
assembly model. We show that this problem, along with multiplication, has a worst-case lower bound of
$\Omega(\sqrt{n})$ time in 2D and $\Omega(\sqrt[3]{n})$ time in 3D. These lower bounds are derived by a
reduction from a simple problem we term the communication problem, in which two distant bits must compute
the AND function between themselves. This general reduction technique can likely be applied to a number of
problems and yields key insights into how one might design a sub-linear time solution to such problems. We
in turn show how these lower bounds, in the case of 2D and 3D addition, are matched by corresponding worst
case $O(\sqrt{n})$ and $O(\sqrt[3]{n})$ run time algorithms, respectively, which improve upon the previous
best known result of $O(n)$ [4]. We then consider the average case complexity of addition given two
uniformly generated random n-bit numbers and construct an $O(\log n)$ average case time algorithm that
achieves a simultaneous worst case run time of $O(\sqrt{n})$ in 2D. To the best of our knowledge this is the
first tile assembly algorithm proposed for efficient average case adding. Our results are summarized in
Table 1. In addition to our analytical results, tile self-assembly software simulations were conducted to
visualize the diverse approaches to fast arithmetic presented in this paper, as well as to compare them to
previous work. The adder tile constructions described in Sections 4, 5, and 6 and the previous best known
algorithm from [4] were simulated using the two timing models described in Section 2.3. These results can
be seen in the graphs in Section 7.



2 Definitions 

2.1 Basic Notation. 

Let $\mathbb{N}_n$ denote the set $\{1, \ldots, n\}$ and let $\mathbb{Z}_n$ denote the set $\{0, \ldots, n-1\}$.
Consider two points $p, q \in \mathbb{Z}^d$, $p = (p_1, \ldots, p_d)$, $q = (q_1, \ldots, q_d)$.
Define $\Delta_{p,q} = \max_{1 \le i \le d} \{|p_i - q_i|\}$.

2.2 Abstract Tile Assembly Model. 

Tiles. Consider some alphabet of symbols $\Pi$ called the glue types. A tile is a finite edge polygon
(polyhedron in the case of a 3D generalization) with some finite subset of border points each assigned some
glue type from $\Pi$. Further, each glue type $g \in \Pi$ has some non-negative integer strength $str(g)$.
For each tile $t$ we also associate a finite string label (typically "0", or "1", or the empty label in this
paper), denoted by $label(t)$, which allows the classification of tiles by their labels. In this paper we
consider a special class of tiles that are unit squares (or unit cubes in 3D) of the same orientation with at
most one glue type per face, with each glue being placed exactly in the center of the tile's face. We denote
the location of a tile to be the point at the center of the square or cube tile. In this paper we focus on
tiles at integer locations.
Assemblies. An assembly is a finite set of tiles whose interiors do not overlap. To simplify
formalization in this paper, we further require the center of each tile in an assembly to be an integer
coordinate (or integer triplet in 3D). If each tile in $A$ is a translation of some tile in a set of tiles
$T$, we say that $A$ is an assembly over tile set $T$. For a given assembly $\Upsilon$, define the bond graph
$G_\Upsilon$ to be the weighted graph in which each element of $\Upsilon$ is a vertex, and the weight of an
edge between two tiles is the strength of the overlapping matching glue points between the two tiles. Note
that only overlapping glues that are the same type contribute a non-zero weight, whereas overlapping,
non-equal glues always contribute zero weight to the bond graph. The property that only equal glue types
interact with each other is referred to as the diagonal glue function property and is perhaps more feasible
than more general glue functions for experimental implementation. An assembly $\Upsilon$ is said to be
$\tau$-stable for an integer $\tau$ if the min-cut of $G_\Upsilon$ is at least $\tau$.



Tile Attachment. Given a tile $t$, an integer $\tau$, and a $\tau$-stable assembly $A$, we say that $t$ may
attach to $A$ at temperature $\tau$ to form $A'$ if there exists a translation $t'$ of $t$ such that
$A' = A \cup \{t'\}$, and $A'$ is $\tau$-stable. For a tile set $T$ we use notation $A \to_{T,\tau} A'$ to
denote that there exists some $t \in T$ that may attach to $A$ to form $A'$ at temperature $\tau$. When $T$
and $\tau$ are implied, we simply say $A \to A'$. Further, we say that $A \to^* A'$ if there exists a finite
sequence of assemblies $\langle A_1, \ldots, A_k \rangle$ such that $A \to A_1 \to \ldots \to A_k \to A'$.



Tile Systems. A tile system $\Gamma = (T, S, \tau)$ is an ordered triplet consisting of a set of tiles $T$
referred to as the system's tile set, a $\tau$-stable assembly $S$ referred to as the system's seed assembly,
and a positive integer $\tau$ referred to as the system's temperature. A tile system $\Gamma = (T, S, \tau)$
has an associated set of producible assemblies, $PROD_\Gamma$, which defines what assemblies can grow from
the initial seed $S$ by any sequence of temperature $\tau$ tile attachments from $T$. Formally,
$S \in PROD_\Gamma$ as a base case producible assembly. Further, for any $A \in PROD_\Gamma$, if
$A \to_{T,\tau} A'$, then $A' \in PROD_\Gamma$. That is, assembly $S$ is producible, and for any producible
assembly $A$, if $A$ can grow into $A'$, then $A'$ is also producible. We further define the set of terminal
assemblies $TERM_\Gamma$ to be the subset of $PROD_\Gamma$ containing all producible assemblies that have no
attachable tile from $T$ at temperature $\tau$. Conceptually, $TERM_\Gamma$ represents the final collection
of output assemblies that are built from $\Gamma$ given enough time for all assemblies to reach a final,
terminal state. General tile systems may have terminal assembly sets containing 0, finitely many, or
infinitely many distinct assemblies. Systems with exactly 1 terminal assembly are said to be deterministic.
For a deterministic tile system $\Gamma$, we say $\Gamma$ uniquely assembles assembly $A$ if
$TERM_\Gamma = \{A\}$. In this paper, we focus exclusively on deterministic systems. For recent
consideration of non-determinism in tile self-assembly see [5, 8, 12].



2.3 Problem Description. 

We now formalize what we mean for a tile self-assembly system to compute a function. To do this we present
the concept of a tile assembly computer (TAC), which consists of a tile set and temperature parameter, along
with input and output templates. The input template serves as a seed structure with a sequence of wildcard
positions at which tiles of label "0" and "1" may be placed to construct an initial seed assembly. An output
template is a sequence of points denoting locations at which the TAC, when grown from a filled-in template,
will place tiles with "0" and "1" labels that denote the output bit string. A TAC then is said to compute a
function $f$ if for any seed assembly derived by plugging in a bit string $b$, the terminal assembly of the
system with tile set $T$ and temperature $\tau$ will be such that the value of $f(b)$ is encoded in the
sequence of tiles placed according to the locations of the output template. We now develop the formal
definition of the TAC concept. We note that the formality in the input template is of substantial
importance. Simpler definitions which map seeds to input bit strings, and terminal assemblies to output bit
strings, are problematic in that they allow for the possibility of encoding the computation of function $f$
in the seed structure. Even something as innocuous sounding as allowing more than a single type of "0" or
"1" tile as an input bit has the subtle issue of allowing pre-computing of $f$.¹

¹ This subtle issue seems to exist with some previous formulations of tile assembly computation.
Input Template. Consider a tile set $T$ containing exactly one tile $t_0$ with label "0", and one tile $t_1$
with label "1". An n-bit input template over tile set $T$ is an ordered pair $U = (R, B(i))$, where $R$ is an
assembly over $T \setminus \{t_0, t_1\}$, $B : \mathbb{N}_n \to \mathbb{Z}^2$, and $B(i)$ is not the position
of any tile in $R$ for any $i$ from 1 to $n$. The sequence of $n$ coordinates denoted by $B$ conceptually
denotes "wildcard" tile positions at which copies of $t_0$ and $t_1$ will be filled in for any instance of
the template. For notation we define assembly $U_b$ over $T$, for bit string $b = b_1, \ldots, b_n$, to be
the assembly consisting of assembly $R$ unioned with a set of $n$ tiles $t^i$ for $i$ from 1 to $n$, where
$t^i$ is equal to a translation of tile $t_{b_i}$ to position $B(i)$. That is, $U_b$ is the assembly $R$ with
each position $B(i)$ tiled with either $t_0$ or $t_1$ according to the value of $b_i$.

Output Template. A k-bit output template is simply a sequence of $k$ coordinates denoted by function
$C : \mathbb{N}_k \to \mathbb{Z}^2$. For an output template $V$, an assembly $A$ over $T$ is said to
represent binary string $c = c_1, \ldots, c_k$ over template $V$ if the tile at position $C(i)$ in $A$ has
label $c_i$ for all $i$ from 1 to $k$. Note that output template solutions are much looser than input
templates in that there may be multiple tiles with labels "1" and "0", and there are no restrictions on the
assembly outside of the $k$ specified wildcard positions. The strictness for the input template stems from
the fact that the input must "look the same" in all ways except for the explicit input bit patterns. If this
were not the case, it would likely be possible to encode the solution to the computational problem into the
input template, resulting in a trivial solution.
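To make the two template definitions concrete, the following is a minimal Python sketch, not part of the paper's formalism. It assumes an assembly is stored as a dictionary from integer positions to tile names, and the tile names "t0", "t1", and "frame" as well as the helper names are hypothetical choices made only for illustration.

```python
from typing import Dict, List, Tuple

Pos = Tuple[int, int]

def fill_input_template(frame: Dict[Pos, str],
                        wildcard_positions: List[Pos],
                        bits: str) -> Dict[Pos, str]:
    """Build U_b: the frame R plus tile t0 or t1 at each wildcard position B(i)."""
    if len(bits) != len(wildcard_positions):
        raise ValueError("need exactly one bit per wildcard position")
    assembly = dict(frame)
    for pos, bit in zip(wildcard_positions, bits):
        if pos in frame:
            raise ValueError("wildcard position collides with the frame")
        # Exactly one tile type per input bit value (hypothetical names).
        assembly[pos] = "t1" if bit == "1" else "t0"
    return assembly

def read_output(assembly: Dict[Pos, str],
                output_positions: List[Pos],
                label_of: Dict[str, str]) -> str:
    """Read the bit string represented over an output template C."""
    return "".join(label_of[assembly[pos]] for pos in output_positions)

# Usage: a one-tile frame and a 2-bit template filled with b = "10".
frame = {(0, 0): "frame"}
seed = fill_input_template(frame, [(1, 0), (2, 0)], "10")
labels = {"t0": "0", "t1": "1", "frame": ""}
```

Note that the input side admits only the two fixed tiles, while `read_output` looks only at labels, mirroring the asymmetry between the strict input template and the looser output template.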

Function Computing Problem. A tile assembly computer (TAC) is an ordered quadruple $\Im = (T, U, V, \tau)$
where $T$ is a tile set, $U$ is an n-bit input template, and $V$ is a k-bit output template. A TAC is said to
compute function $f : \mathbb{Z}_2^n \to \mathbb{Z}_2^k$ if for any $b \in \mathbb{Z}_2^n$ and
$c \in \mathbb{Z}_2^k$ such that $f(b) = c$, the tile system $\Gamma_{\Im,b} = (T, U_b, \tau)$ uniquely
assembles an assembly $A$ which represents $c$ over template $V$. For a TAC $\Im$ that computes the function
$f : \mathbb{Z}_2^{2n} \to \mathbb{Z}_2^{n+1}$ where $f(r_1 \ldots r_{2n}) = r_1 \ldots r_n + r_{n+1} \ldots r_{2n}$,
we say that $\Im$ is an n-bit adder TAC with inputs $a = r_1 \ldots r_n$ and $b = r_{n+1} \ldots r_{2n}$.
An n-bit multiplier TAC is defined similarly.

Run-time. The run time model we use in this paper was first proposed in [3]. For a deterministic tile
system $\Gamma = (T, S, \tau)$ and assembly $A \in PROD_\Gamma$, the 1-step transition set of assemblies for
$A$ is defined to be $STEP_{\Gamma,A} = \{B \in PROD_\Gamma \mid A \to_{T,\tau} B\}$. For a given
$A \in PROD_\Gamma$, let $PARALLEL_{\Gamma,A} = \bigcup_{B \in STEP_{\Gamma,A}} B$, i.e.,
$PARALLEL_{\Gamma,A}$ is the result of attaching all singleton tiles that can attach directly to $A$. Note
that since $\Gamma$ is deterministic, $PARALLEL_{\Gamma,A}$ is guaranteed to not contain overlapping tiles
and is therefore an assembly. For an assembly $A$, we say $A \rightrightarrows_\Gamma A'$ if
$A' = PARALLEL_{\Gamma,A}$. We define the parallel run-time of a deterministic tile system
$\Gamma = (T, S, \tau)$ to be the non-negative integer $k$ such that
$A_1 \rightrightarrows_\Gamma A_2 \rightrightarrows_\Gamma \cdots \rightrightarrows_\Gamma A_k$ where
$A_1 = S$ and $\{A_k\} = TERM_\Gamma$. For any assemblies $A$ and $B$ in $PROD_\Gamma$ such that
$A_1 \rightrightarrows_\Gamma A_2 \rightrightarrows_\Gamma \cdots \rightrightarrows_\Gamma A_k$ with
$A = A_1$ and $B = A_k$, we say that $A \rightrightarrows_\Gamma^k B$. Alternately, we denote $B$ with
notation $A \rightrightarrows_\Gamma^k$. For a TAC $\Im = (T, U, V, \tau)$ that computes function $f$, the run
time of $\Im$ on input $b$ is defined to be the parallel run-time of tile system
$\Gamma_{\Im,b} = (T, U_b, \tau)$. Worst case and average case run time are then defined in terms of the
largest run time inducing $b$ and the average run time for a uniformly generated random $b$.
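The parallel run-time definition can be illustrated with a small simulator. The sketch below is an assumption-laden toy rather than the simulator used in this paper: it works in 2D, represents each tile as a dictionary of (glue type, strength) pairs per side, treats "bond strength at least $\tau$ with the existing neighbors" as the attachment test, and counts the number of parallel steps taken until no tile can attach. All function and tile names are hypothetical.

```python
from typing import Dict, Optional, Tuple

Pos = Tuple[int, int]
Glue = Optional[Tuple[str, int]]      # (glue type, strength), or None for no glue
Tile = Dict[str, Glue]                # sides keyed by "N", "E", "S", "W"

DIRS = {"N": (0, 1), "E": (1, 0), "S": (0, -1), "W": (-1, 0)}
OPPOSITE = {"N": "S", "E": "W", "S": "N", "W": "E"}

def bond_strength(tiles: Dict[str, Tile], assembly: Dict[Pos, str],
                  pos: Pos, name: str) -> int:
    """Total strength of matching glues between tile `name` at `pos` and its neighbors."""
    total = 0
    for side, (dx, dy) in DIRS.items():
        neighbor = assembly.get((pos[0] + dx, pos[1] + dy))
        if neighbor is None:
            continue
        mine = tiles[name].get(side)
        theirs = tiles[neighbor].get(OPPOSITE[side])
        if mine is not None and mine == theirs:   # only equal glues interact
            total += mine[1]
    return total

def parallel_step(tiles: Dict[str, Tile], assembly: Dict[Pos, str],
                  tau: int) -> Dict[Pos, str]:
    """All single-tile attachments possible right now (the PARALLEL set minus A)."""
    frontier = {(x + dx, y + dy) for (x, y) in assembly
                for dx, dy in DIRS.values()} - set(assembly)
    new_tiles = {}
    for pos in frontier:
        for name in tiles:
            if bond_strength(tiles, assembly, pos, name) >= tau:
                new_tiles[pos] = name   # a deterministic system has at most one choice
                break
    return new_tiles

def parallel_run_time(tiles: Dict[str, Tile], seed: Dict[Pos, str], tau: int):
    """Number of parallel steps until the assembly is terminal, plus that assembly."""
    assembly = dict(seed)
    steps = 0
    while True:
        new_tiles = parallel_step(tiles, assembly, tau)
        if not new_tiles:
            return steps, assembly
        assembly.update(new_tiles)
        steps += 1

# Usage: a four-tile row that must grow one tile at a time at temperature 1.
TOY_TILES = {
    "seed": {"E": ("a", 1)},
    "t1": {"W": ("a", 1), "E": ("b", 1)},
    "t2": {"W": ("b", 1), "E": ("c", 1)},
    "t3": {"W": ("c", 1)},
}
toy_steps, toy_final = parallel_run_time(TOY_TILES, {(0, 0): "seed"}, tau=1)
```

The toy row has no parallelism at all, so its step count equals its length minus the seed; the constructions in this paper aim for the opposite regime, where many positions fill in the same step.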

Probabilistic Run-time. In addition to the simple parallel run time, we also introduce a probabilistic
run time, which is a discrete version of the continuous time Markov process models originally proposed in
[2]. Its purpose is to show in simulation that our simple parallel model sufficiently captures the notion of
run time when compared to a more realistic probabilistic model. Briefly, in a single time step the model
attaches in parallel any number of tiles that may attach, with the added restriction that each tile attaches
with only probability $\frac{1}{|T|}$, modelling the concept of a run time in which a time step is normalized
to the wait time for a single tile (not necessarily the correct tile) to bump into an attachment position.
Analytical study of the relation between this type of run time and parallel run time is a direction for
future work, although the two models should be within logarithmic factors of one another in expectation for
constant size tile sets, which we utilize exclusively in this paper.
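As a rough illustration of the probabilistic model, the sketch below simulates only the simplest possible case: a single row of positions in which each position becomes attachable once its predecessor is filled, and the one attachable position fills with probability $1/|T|$ per time step. The function name and the row abstraction are assumptions made for illustration; this is not the simulator used for the results in this paper.

```python
import random

def probabilistic_chain_time(length: int, num_tile_types: int,
                             rng: random.Random) -> int:
    """Time steps needed to fill a row of `length` positions one after another.

    In every time step the single attachable position receives its tile with
    probability 1 / num_tile_types, following the probabilistic run-time model.
    """
    attach_probability = 1.0 / num_tile_types
    filled = 0
    steps = 0
    while filled < length:
        steps += 1
        if rng.random() < attach_probability:
            filled += 1
    return steps

# Usage: with one tile type the model degenerates to the parallel run time.
deterministic_time = probabilistic_chain_time(10, 1, random.Random(0))
slowed_time = probabilistic_chain_time(10, 4, random.Random(0))
```

For a sequential row the expected time is `length * num_tile_types`, a constant-factor slowdown for a constant size tile set; the logarithmic gap mentioned above would arise when many positions attach in parallel and the step must wait for the slowest of them.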






3 Lower Bound for Long Distance Communication 



In this section we formulate a class of problems we term the communication problems, in which the goal is
to compute a simple AND function on a 2-bit input given that the input template separates the 2 input bits
by some specified distance $\Delta$. We formulate this problem for the purposes of providing lower bounds on
the worst-case time complexity for this problem. We then reduce this problem to addition and multiplication
problems in 2D and 3D to provide worst case lower bounds for addition and multiplication.



3.1 High-Level Sketch of Lower Bound Proofs 

To prove lower bounds for addition and multiplication in 2D and 3D, we do the following. First, we consider
two identical tile systems with the exception of their respective seed assemblies, which differ in exactly
one tile location. We show in Lemma 3.1 that after $\Delta$ time steps, all positions more than $\Delta$
distance from the point of initial difference of the assemblies must be identical among the two systems. We
then consider the communication problem in which we compute the AND function of two input bits under the
assumption that the input template for the problem separates the two bits by distance $\Delta$. For such a
problem, we know that the output position of the solution bit must be at least distance $\frac{1}{2}\Delta$
from one of the two input bits. As the correct output for the AND function must be a function of both bits,
Lemma 3.1 implies that at least $\frac{1}{2}\Delta$ steps are required to guarantee a correct solution, as
argued in Theorem 3.2.

With the lower bound of $\frac{1}{2}\Delta$ established for the communication problem, we move on to the
problems of addition and multiplication of n-bit numbers. We show how the communication problem can be
reduced to these problems, thereby yielding corresponding lower bounds. In particular, consider the addition
problem in 2D. As the input template must contain positions for $2n$ bits, in 2D it must be the case that
some pair of bits are separated by at least $\Omega(\sqrt{n})$ distance according to Lemma 3.3. Focusing on
this pair of bit positions in the addition template, we can create a corresponding communication problem
template with the same two positions as input. To guarantee the correct output, we hard code the remaining
bit positions of the addition template such that the addition algorithm is guaranteed to place the AND of
the desired bit pair in a specific position in the output template, thereby constituting a solution to the
$\Delta = \Omega(\sqrt{n})$ communication problem, which implies the addition solution cannot finish faster
than $\Omega(\sqrt{n})$ in the worst case. A similar reduction can be applied to multiplication. The precise
reductions are detailed in Theorems 3.4 and 3.6.



3.2 Communication Problem. 

The $\Delta$-communication problem is the problem of computing the function $f(b_1, b_2) = b_1 \wedge b_2$
for bits $b_1$ and $b_2$ in the 3D aTAM under the additional constraint that the input template for the
solution $U = (R, B(i))$ be such that
$\Delta = \max(|B(1)_x - B(2)_x|, |B(1)_y - B(2)_y|, |B(1)_z - B(2)_z|)$.

We first establish a lemma which intuitively states that for any 2 seed assemblies that differ in only a 
single tile position, all points of distance greater than r from the point of difference will be identically tiled 
(or empty) after r time steps of parallelized tile attachments: 

Lemma 3.1. Let $S_{p,t}$ and $S_{p,t'}$ denote two assemblies that are identical except for a single tile $t$
versus $t'$ at position $p = (p_x, p_y, p_z)$ in each assembly. Further, let $\Gamma = (T, S_{p,t}, \tau)$ and
$\Gamma' = (T, S_{p,t'}, \tau)$ be two deterministic tile assembly systems such that
$S_{p,t} \rightrightarrows_\Gamma^r R$ and $S_{p,t'} \rightrightarrows_{\Gamma'}^r R'$ for non-negative
integer $r$. Then for any point $q = (q_x, q_y, q_z)$ such that
$r < \max(|p_x - q_x|, |p_y - q_y|, |p_z - q_z|)$, it must be that $R_q = R'_q$, i.e., $R$ and $R'$ contain
the same tile at point $q$.

Proof. We show this by induction on $r$. As a base case of $r = 0$, we have that $R = S_{p,t}$ and
$R' = S_{p,t'}$, and therefore $R$ and $R'$ are identical at any point outside of point $p$ by the
definition of $S_{p,t}$ and $S_{p,t'}$.

Inductively, assume that for some integer $k$ we have that for all points $w$ such that
$k < \Delta_{p,w} = \max(|p_x - w_x|, |p_y - w_y|, |p_z - w_z|)$, we have that $R^k_w = R'^k_w$, where
$S_{p,t} \rightrightarrows_\Gamma^k R^k$ and $S_{p,t'} \rightrightarrows_{\Gamma'}^k R'^k$. Now consider
some point $q$ such that $k + 1 < \Delta_{p,q} = \max(|p_x - q_x|, |p_y - q_y|, |p_z - q_z|)$, along with
assemblies $R^{k+1}$ and $R'^{k+1}$ where $S_{p,t} \rightrightarrows_\Gamma^{k+1} R^{k+1}$ and
$S_{p,t'} \rightrightarrows_{\Gamma'}^{k+1} R'^{k+1}$. Consider the direct neighbors (6 of them in 3D) of
point $q$. For each neighbor point $c$, we know that $\Delta_{p,c} > k$. Therefore, by the inductive
hypothesis, $R^k_c = R'^k_c$ where $S_{p,t} \rightrightarrows_\Gamma^k R^k$ and
$S_{p,t'} \rightrightarrows_{\Gamma'}^k R'^k$. Therefore, as attachment of a tile at a position is only
dependent on the tiles in neighboring positions of the point, we know that tile $R^{k+1}_q$ may attach to
both $R^k$ and $R'^k$ at position $q$, implying that $R^{k+1}_q = R'^{k+1}_q$ as $\Gamma$ and $\Gamma'$ are
deterministic. □

Theorem 3.2. Any solution to the $\Delta$-communication problem has run time at least $\frac{1}{2}\Delta$.

Proof. Consider a TAC $\Im = (T, U = (R, B(i)), V(i), \tau)$ that computes the $\Delta$-communication
problem. First, note that $B$ has domain of 1 and 2, and $V$ has domain of just 1 (the input is 2 bits, the
output is 1 bit). We now consider the value $\Delta_V$ defined to be the largest distance between the output
bit position of $V$ from either of the two input bit positions in $B$: Let
$\Delta_V = \max(\Delta_{B(1),V(1)}, \Delta_{B(2),V(1)})$. Without loss of generality, assume
$\Delta_V = \Delta_{B(1),V(1)}$. Note that $\Delta_V \ge \frac{1}{2}\Delta$.

Now consider the computation of $f(0, 1) = 0$ versus the computation of $f(1, 1) = 1$ via our TAC $\Im$. Let
$A^0$ denote the terminal assembly of system $\Gamma_0 = (T, U_{0,1}, \tau)$ and let $A^1$ denote the
terminal assembly of system $\Gamma_1 = (T, U_{1,1}, \tau)$. As $\Im$ computes $f$, we know that
$A^0_{V(1)} \ne A^1_{V(1)}$. Further, from Lemma 3.1 we know that for any $r < \Delta_V$, we have that
$W^0_{V(1)} = W^1_{V(1)}$ for any $W^0$ and $W^1$ such that $U_{0,1} \rightrightarrows_{\Gamma_0}^r W^0$ and
$U_{1,1} \rightrightarrows_{\Gamma_1}^r W^1$. Let $d_\Im$ denote the run time of $\Im$. Then we know that
$U_{0,1} \rightrightarrows_{\Gamma_0}^{d_\Im} A^0$ and $U_{1,1} \rightrightarrows_{\Gamma_1}^{d_\Im} A^1$ by
the definition of run time. If $d_\Im < \Delta_V$, then Lemma 3.1 implies that $A^0_{V(1)} = A^1_{V(1)}$,
which contradicts the fact that $\Im$ computes $f$. Therefore, the run time $d_\Im$ is at least
$\Delta_V \ge \frac{1}{2}\Delta$. □

3.3 $\Omega(\sqrt[d]{n})$ Lower Bounds for Addition and Multiplication.

We now show how to reduce instances of the communication problem to the arithmetic problems of addition
and multiplication in 2D and 3D to obtain lower bounds of $\Omega(\sqrt{n})$ and $\Omega(\sqrt[3]{n})$
respectively.

We first show the following lemma, which lower bounds the distance of the farthest pair of points in a
set of $n$ points. We believe this lemma is likely well known, or is at least the corollary of an
established result. We include its proof for completeness.

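Lemma 3.3. For positive integers $n$ and $d$, consider $A \subset \mathbb{Z}^d$ and $B \subset \mathbb{Z}^d$
such that $A \cap B = \emptyset$ and $|A| = |B| = n$. There must exist points $p \in A$ and $q \in B$ such
that $\Delta_{p,q} \ge \lceil \frac{1}{2} \lceil \sqrt[d]{2n} \rceil \rceil - 1$.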

Proof. To see this, consider a bounding box of all $2n$ points. If all $d$ dimensions of the bounding box
were of length strictly less than $\sqrt[d]{2n}$, then the box could not contain all $2n$ points. Therefore,
at least one dimension is of length at least $\lceil \sqrt[d]{2n} \rceil$, implying that there are two
points of distance at least $\lceil \sqrt[d]{2n} \rceil - 1$ along that particular axis. If these two points
are in $A$ and $B$ respectively, then the claim follows. If not, say both are from set $A$, then there must
be a point in $B$ that is at least $\lceil \frac{1}{2} \lceil \sqrt[d]{2n} \rceil \rceil - 1$ from one of
these two points in $A$, implying the claim. □
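The bound of Lemma 3.3 is easy to probe numerically. The following Python sketch is an illustrative check, not a proof: it packs $2n$ random points into the smallest cube that can hold them (the tightest case for the lemma), splits them into sets $A$ and $B$, and compares the farthest cross pair under the distance $\Delta_{p,q}$ against the bound. The helper names are hypothetical, and the integer root is computed by search to avoid floating-point rounding.

```python
import itertools
import random

def ceil_root(m: int, d: int) -> int:
    """Smallest integer r with r**d >= m, i.e. the ceiling of the d-th root of m."""
    r = 0
    while r ** d < m:
        r += 1
    return r

def lemma_bound(n: int, d: int) -> int:
    """The bound ceil(ceil((2n)^(1/d)) / 2) - 1 from Lemma 3.3."""
    side = ceil_root(2 * n, d)
    return (side + 1) // 2 - 1

def chebyshev(p, q) -> int:
    """The distance Delta_{p,q}: maximum coordinate-wise difference."""
    return max(abs(a - b) for a, b in zip(p, q))

def farthest_cross_pair(A, B) -> int:
    """Largest Delta_{p,q} over p in A and q in B."""
    return max(chebyshev(p, q) for p in A for q in B)

def check_random(n: int, d: int, trials: int, rng: random.Random) -> bool:
    """Sample disjoint point sets in the tightest cube and test the bound."""
    side = ceil_root(2 * n, d)
    cells = list(itertools.product(range(side), repeat=d))
    for _ in range(trials):
        points = rng.sample(cells, 2 * n)     # 2n distinct lattice points
        A, B = points[:n], points[n:]
        if farthest_cross_pair(A, B) < lemma_bound(n, d):
            return False
    return True

# Usage: 64 points in the plane must contain a cross pair at distance >= 3.
bound_2d = lemma_bound(32, 2)
holds_2d = check_random(32, 2, 200, random.Random(1))
```

Spreading the points over a larger region can only increase the farthest cross pair, which is why the check restricts itself to the smallest enclosing cube.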

Theorem 3.4. Any n-bit adder TAC that has a dimension $d$ input template for $d = 1$, $d = 2$, or $d = 3$,
has a worst case run time of $\Omega(\sqrt[d]{n})$.

Proof. To show the lower bound, we will reduce the $\Delta$-communication problem for some
$\Delta = \Omega(\sqrt[d]{n})$ to the n-bit adder problem with a $d$-dimension template. Consider some n-bit
adder TAC $\Im = (T, U = (F, W), V, \tau)$ such that $U$ is a $d$-dimension template. The $2n$ sequence of
wildcard positions $W$ of this TAC must be contained in $d$-dimensional space by the definition of a
$d$-dimension template, and therefore by Lemma 3.3 there must exist points $W(i)$ for $i \le n$, and
$W(n + j)$ for $j \le n$, such that
$\Delta_{W(i),W(n+j)} \ge \lceil \frac{1}{2} \lceil \sqrt[d]{2n} \rceil \rceil - 1 = \Omega(\sqrt[d]{n})$.
Now consider two n-bit inputs $a = a_n \ldots a_1$ and $b = b_n \ldots b_1$ to the adder TAC $\Im$ such that:
$a_k = 0$ for any $k > i$ and any $k < j$, and $a_k = 1$ for any $k$ such that $j \le k < i$. Further, let
$b_k = 0$ for all $k \ne j$. The remaining bits $a_i$ and $b_j$ are unassigned variables of value either 0 or
1. Note that the $i + 1$ bit of $a + b$ is 1 if and only if $a_i$ and $b_j$ are both value 1. This setup
constitutes our reduction of the $\Delta$-communication problem to the addition problem, as the adder TAC
template with the specified bits hardcoded in constitutes a template for the $\Delta$-communication problem
that produces the AND of the input bit pair. We now specify explicitly how to generate a communication TAC
from a given adder TAC.
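The arithmetic fact behind this reduction can be checked exhaustively for small $n$. The sketch below is illustrative only; it assumes bit positions are 1-indexed with bit $k$ carrying weight $2^{k-1}$, and it assumes $j \le i$ so that the run of hardcoded ones between positions $j$ and $i - 1$ can carry $b_j$ up to $a_i$. The function names are hypothetical.

```python
def hardcoded_addends(i: int, j: int, a_i: int, b_j: int):
    """Build a and b with every bit fixed except a_i and b_j.

    Bits are 1-indexed; bit k has weight 2**(k-1). Assumes j <= i.
    """
    a = sum(1 << (k - 1) for k in range(j, i))   # a_k = 1 for j <= k < i
    a |= a_i << (i - 1)                          # the free bit a_i
    b = b_j << (j - 1)                           # b_k = 0 except the free bit b_j
    return a, b

def bit(x: int, k: int) -> int:
    """The k-th bit of x, 1-indexed from the least significant end."""
    return (x >> (k - 1)) & 1

def reduction_holds(n: int) -> bool:
    """Check that bit i+1 of a+b equals a_i AND b_j for all 1 <= j <= i <= n."""
    for i in range(1, n + 1):
        for j in range(1, i + 1):
            for a_i in (0, 1):
                for b_j in (0, 1):
                    a, b = hardcoded_addends(i, j, a_i, b_j)
                    if bit(a + b, i + 1) != (a_i & b_j):
                        return False
    return True

# Usage: i = 4, j = 2 with both free bits set gives 14 + 2 = 16, so bit 5 is 1.
example_a, example_b = hardcoded_addends(4, 2, 1, 1)
all_cases_ok = reduction_holds(8)
```

The run of ones acts as a wire: a 1 at $b_j$ ripples through every hardcoded position and reaches position $i$, where it produces a carry into position $i + 1$ exactly when $a_i$ is also 1.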






For a given n-bit adder TAC $\Im = (T, U = (F, W), V, \tau)$ with dimension $d$ input template, we derive a
$\Delta$-communication TAC $\rho = (T, U_2 = (F_2, W_2), V_2, \tau)$ as follows. First, let
$W_2(1) = W(i)$, and $W_2(2) = W(n + j)$. Note that as $\Delta_{W(i),W(n+j)} = \Omega(\sqrt[d]{n})$, $W_2$
satisfies the requirements for a $\Delta$-communication input template for some
$\Delta = \Omega(\sqrt[d]{n})$. Derive the frame of the template $F_2$ from $F$ by adding tiles to $F$ as
follows: For any positive integer $k > i$, or $k < j$, or $k > n$ but not $k = n + j$, add a translation of
$t_0$ (with label "0") translated to position $W(k)$. Additionally, for any $k$ such that $j \le k < i$, add
a translation of $t_1$ (with label "1") at translation $W(k)$.

Now consider the $\Delta$-communication TAC $\rho = (T, U_2 = (F_2, W_2), V_2, \tau)$ for some
$\Delta = \Omega(\sqrt[d]{n})$. As assembly $U_{2,a_i b_j} = U_{a_1 \ldots a_n, b_1 \ldots b_n}$, we know that
the worst case run time of $\rho$ is at most that of the worst case run time of $\Im$. Therefore, by
Theorem 3.2 we have that $\Im$ has a run time of at least $\Omega(\sqrt[d]{n})$. □



As the bound of dimension d on the input template of a TAC alone lower bounds the run time of the 
TAC, we get the following corollary. 

Corollary 3.5. Any $d$-dimension n-bit adder TAC has worst case run-time $\Omega(\sqrt[d]{n})$.

We now provide a lower bound for multiplication. 

Theorem 3.6. Any n-bit multiplier TAC that has a dimension $d$ input template for $d = 1$, $d = 2$, or
$d = 3$, has a worst case run time of $\Omega(\sqrt[d]{n})$.

Proof. Consider some n-bit multiplier TAC $\Im = (T, U = (F, W), V, \tau)$ with $d$-dimension input
template. By Lemma 3.3, some $W(i)$ and $W(n + j)$ must have distance at least
$\Delta \ge \lceil \frac{1}{2} \lceil \sqrt[d]{2n} \rceil \rceil - 1$. Now consider input strings
$a = a_n \ldots a_1$ and $b = b_n \ldots b_1$ to $\Im$ such that $a_i$ and $b_j$ are of variable value, and
all other $a_k$ and $b_k$ have value 0. For such input strings, the $i + j$ bit of the product $ab$ has value
1 if and only if $a_i = b_j = 1$. Thus, we can convert the n-bit multiplier system $\Im$ into a
$\Delta$-communication TAC with the same worst case run time in the same fashion as for Theorem 3.4,
yielding an $\Omega(\sqrt[d]{n})$ lower bound for the worst case run time of $\Im$. □
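The product identity used in this proof can be checked directly. The sketch below is illustrative and makes an explicit indexing assumption: if positions are 1-indexed with bit $k$ carrying weight $2^{k-1}$, the single product bit lands at position $i + j - 1$, which is the same bit that is called $i + j$ when positions are counted from 0. Either way, exactly one output position carries the AND of the two free bits. The function names are hypothetical.

```python
def product_of_free_bits(i: int, j: int, a_i: int, b_j: int) -> int:
    """Product ab when a_i and b_j are the only possibly non-zero bits.

    Positions are 1-indexed here; bit k has weight 2**(k-1).
    """
    a = a_i << (i - 1)
    b = b_j << (j - 1)
    return a * b

def and_bit_position(i: int, j: int) -> int:
    """1-indexed position of the product bit that equals a_i AND b_j."""
    return product_of_free_bits(i, j, 1, 1).bit_length()

def multiplication_reduction_holds(n: int) -> bool:
    """Check that the product is exactly (a_i AND b_j) placed at one fixed bit."""
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            position = and_bit_position(i, j)
            for a_i in (0, 1):
                for b_j in (0, 1):
                    expected = (a_i & b_j) << (position - 1)
                    if product_of_free_bits(i, j, a_i, b_j) != expected:
                        return False
    return True

# Usage: free bits at 1-indexed positions 3 and 4 multiply into position 6.
example_position = and_bit_position(3, 4)
multiplication_ok = multiplication_reduction_holds(8)
```

Unlike the addition reduction, no chain of hardcoded ones is needed here, since multiplying two one-hot values never generates a carry.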

As with addition, the lower bound implied by the limited dimension of the input template alone yields
the general lower bound for $d$-dimensional multiplication TACs.

Corollary 3.7. Any $d$-dimension n-bit multiplier TAC has worst case run-time $\Omega(\sqrt[d]{n})$.

4 Addition In Average Case Logarithmic Time 

[Figure 2 schematic. Panel a): two rows, running from "MSB of A" to "LSB of A" and from "MSB of B" to
"LSB of B". Panel b): a single row running from MSB to LSB.]

Figure 2: Arrows represent carry origination and propagation direction. a) This schematic represents the
previously described $O(n)$ worst case addition for addends $A$ and $B$ [4]. The least significant and most
significant bits of $A$ and $B$ are denoted by LSB and MSB, respectively. b) The average case $O(\log n)$
construction described in this paper is shown here. Addends $A$ and $B$ populate the linear assembly with bit
$A_i$ immediately adjacent to $B_i$. Carry propagation is done in parallel along the length of the assembly.

We construct an adder TAC that resembles an electronic carry-skip adder in that the carry-out bit for
addend pairs where each addend in the pair has the same bit value is generated in a constant number of steps
and immediately propagated. When the two addends in a pair do not have the same bit value, a carry-out
cannot be deduced until the value of the carry-in to the pair of addends is known. When such addend
combinations occur in a contiguous sequence, the carry must ripple through the sequence from right to left,
one step at a time as each position is evaluated. Within these worst-case sequences, our construction
resembles an electronic ripple-carry adder. We show that using this approach it is possible to construct an
n-bit adder TAC that can perform addition with an average runtime of $O(\log n)$ and a worst-case runtime
of $O(n)$.

Lemma 4.1. Consider a non-negative integer $N$ generated uniformly at random from the set
$\{0, 1, \ldots, 2^n - 1\}$. The expected length of the longest substring of contiguous ones in the binary
expansion of $N$ is $O(\log n)$.



For a proof of Lemma 4.1 please see Schilling [13].
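For small $n$ the expectation in Lemma 4.1 can be computed exactly by enumeration, which gives a feel for how slowly it grows. The following Python sketch is an illustration under that brute-force approach and is not related to the proof in [13]; the function names are hypothetical.

```python
import math

def longest_run_of_ones(x: int) -> int:
    """Length of the longest block of consecutive 1 bits in x."""
    best = 0
    current = 0
    while x:
        if x & 1:
            current += 1
            best = max(best, current)
        else:
            current = 0
        x >>= 1
    return best

def expected_longest_run(n: int) -> float:
    """Exact expectation over all N drawn uniformly from {0, ..., 2**n - 1}."""
    total = sum(longest_run_of_ones(x) for x in range(1 << n))
    return total / (1 << n)

# Usage: compare the exact expectation with log2(n) for a few small sizes.
if __name__ == "__main__":
    for n in (4, 8, 12, 16):
        print(n, expected_longest_run(n), math.log2(n))
```

Enumeration costs $2^n$ evaluations, so this is only practical for small $n$; for larger sizes one would sample random values instead.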



Theorem 4.2. For any positive integer $n$, there exists an n-bit adder TAC (tile assembly computer) that
has worst case run time $O(n)$ and an average case run time of $O(\log n)$.



The proof of Theorem 4.2 follows from the construction of the adder in Sections 4.1, 4.2, and 4.3.



4.1 Construction 
[Figure 3 tile set panels; recoverable panel titles: d) Carry Transfer, e) Addition.]



Figure 3: This is the complete set of tiles necessary to implement average case $O(\log n)$ and worst case
$O(n)$ addition.



We summarize the mechanism of addition presented here in a short example. The complete tile set may
be found in Figure 3.

Input Template. The input template, or seed, for the construction of an adder with an $O(\log n)$ average
case is shown in Figure 4. This input template is composed of $n$ blocks, each containing three tiles.
Within a block, the easternmost tile is the S-labeled tile, followed by two tiles representing $A_k$ and
$B_k$, the $k$th bits of $A$ and $B$ respectively. Of these $n$ blocks, the easternmost and westernmost
blocks of the template assembly are unique. Instead of an S tile, the block furthest east has an LSB-labeled
tile which accompanies the tiles representing the least significant bits of $A$ and $B$, $A_0$ and $B_0$.
The westernmost block of the template assembly contains a tile labeled MSB instead of the S tile, which
accompanies the most significant bits of $A$ and $B$, $A_{n-1}$ and $B_{n-1}$.
Figure 4: Top: Output template displaying addition result $C$ for the $O(\log n)$ average case addition
construction. Bottom: Input template composed of $n$ blocks of three tiles each, representing n-bit addends
$A$ and $B$.



Computing Carry Out Bits. For clarity, we demonstrate the mechanism of this adder TAC through an
example by selecting two 4-bit binary numbers A and B such that the addends A_i and B_i encompass every
possible addend combination. The input template for such an addition is shown in Figure 5a, where orange
tiles represent bits of A and green tiles represent bits of B. Each block begins the computation in parallel
at its S tile. After six parallel steps (Figure 5b-g), all carry out bits, represented by glues C0 and C1, are
determined for addend-pairs where A_i and B_i are either both 0 or both 1. For addend-pairs A_i and B_i
where one addend is 0 and the other is 1, the carry out bit cannot be deduced until a carry out bit has
been produced by the previous addend-pair, A_{i-1} and B_{i-1}. By step seven, a carry bit has been presented
to all addend-pairs that are flanked on the east by an addend-pair comprised of either both 0s or both 1s,
or that are flanked on the east by the LSB start tile, since the carry in to this site is always 0 (Figure 5h).
For those addend-pairs flanked on the east by a contiguous sequence of j pairs, each consisting of one 1 and
one 0, 2j parallel attachment steps must occur before a carry bit is presented to the pair.
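The classification of addend-pairs described above can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, not a simulation of the tile system: the function names and the labels "G", "K", and "P" are assumptions introduced here, and the example uses the addends 1001 and 1010 from Figure 5.

```python
def bits_lsb_first(binary_string):
    """Convert a string such as '1010' into a list of bits, least significant first."""
    return [int(ch) for ch in reversed(binary_string)]


def classify_pairs(a_bits, b_bits):
    """Label each addend-pair (index 0 is the least significant pair).

    'G': both bits are 1, so a carry out of 1 is known immediately.
    'K': both bits are 0, so a carry out of 0 is known immediately.
    'P': the bits differ, so the pair must wait for its carry in.
    """
    labels = []
    for a, b in zip(a_bits, b_bits):
        if a == b:
            labels.append("G" if a == 1 else "K")
        else:
            labels.append("P")
    return labels


def waiting_run_lengths(labels):
    """For each pair, the number j of consecutive 'P' pairs directly to its east."""
    runs, run = [], 0
    for label in labels:
        runs.append(run)
        run = run + 1 if label == "P" else 0
    return runs


labels = classify_pairs(bits_lsb_first("1001"), bits_lsb_first("1010"))
runs = waiting_run_lengths(labels)
# The text states that 2j further attachment steps are needed after a run of j such pairs.
extra_steps = [2 * j for j in runs]
print(labels, runs, extra_steps)
```

Only the run length j to the east of a pair matters for when its carry in arrives, which is why the later analysis depends on the longest such run.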



Computing the Sum. Once a carry out bit has been computed and carried into an addend-pair A_i and
B_i, two parallel tile addition steps are required to compute the sum of the addend-pair (Figure 5g-j).



4.2 Time Complexity. 

O(n) - worst case. We first show that this construction has an O(n) worst case run-time under the timing
model presented in Section 2.3. Consider a binary sequence T of length 2n representing two n-bit
binary numbers A and B. A_0 and B_0 represent the least significant bits of A and B, respectively, and A_{n-1} and
B_{n-1} represent the most significant bits of A and B, respectively. The formatting of T is such that T_i = B_{i/2}
if i is even, and T_i = A_{(i-1)/2} if i is odd. Sequence T is shown in Figure 7a.

T contains n addend-pairs, (A_i, B_i), which are ordered pairs consisting of the ith bit of A and the
ith bit of B. The four possible values for each addend-pair are shown in Figure 7b, along with the carry
bits they produce upon addition. In T, there exist sequences of various sizes up to k addend-pairs such
that every addend-pair in the interval from (A_i, B_i) to (A_{i+j-1}, B_{i+j-1}) of a size j addend-pair sequence
is (1,0) or (0,1). In the adder TAC outlined in Section 4 and Appendix 4.1, the value of the carry bit
produced upon addition of A_i + B_i for every (0,0) and (1,1) addend-pair is known and made available
to the next addend-pair (A_{i+1}, B_{i+1}) after seven parallel tile addition steps, including the carry bit into
(A_0, B_0). Therefore, after a constant number of parallel tile addition steps, the first addend-pair (A_i, B_i) in
any j addend-pair-length sequence of (1,0) or (0,1) addend-pairs will be presented with the carry bit from
the previous addend-pair (A_{i-1}, B_{i-1}). After 2j subsequent parallel tile addition steps, the carry bit from
the last addend-pair (A_{i+j-1}, B_{i+j-1}) of the j-size sequence of consecutive (0,1) and (1,0) addend-pairs is
presented to (A_{i+j}, B_{i+j}). Once an addend-pair has received a carry-in bit, the final sum is computed in
Figure 5: Example tile assembly system to compute the sum of 1001 and 1010 using the adder TAC presented
in Section 4. a) The template with the two 4-bit binary numbers to be summed. b) The first step is the
parallel binding of tiles to the S, LSB, and MSB tiles. c) Tiles bind cooperatively to west-face S glues and
glues representing bits of B. The purpose of this step is to propagate bit B_i closer to A_i so that in d) a tile
may bind cooperatively, processing information from both A_i and B_i. e) Note that addend-pairs consisting
of either both 1s or both 0s have a tile with a north face glue consisting of either (1,x) or (0,x) bound to
the A_i tile. This glue represents a carry out of either 1 or 0 from the addend-pair. In (e-g) the carry outs
are propagated westward via tile additions for those addend-pairs where the carry out can be determined.
Otherwise, spacer tiles bind. h) Tiles representing bits of C (the sum of A and B) begin to bind where a
carry in is known. i-j) As carry bits propagate through sequences of (0,1) or (1,0) addend-pairs, the final
sum is computed.
Here we see that if the input values are equal, the carry that was just generated will be sent to the next block immediately.
However, if the input values are not equal, the carry for the next block will be propagated after the carry-in is accepted.

Figure 6: The set of figures on the left side shows how a carry out may be generated before the addend-pair
has received a carry in. The set of figures on the right side shows how a carry out can be dependent upon a
carry in having been received by the addend-pair.

Figure 7: a) Binary sequence T populated by n-bit binary numbers A and B. b) Four possible (A_i, B_i)
addend-pair combinations.



Figure 8:

one tile addition step. If k is the length of the longest contiguous sequence of (0,1) and (1,0) addend-pairs, then
2k + 8 parallel tile addition steps are required to compute the sum C of A and B (Figure 8). Therefore, the time
complexity is O(k). Since the growth is bounded above by the length k of the longest contiguous sequence of
(0,1) and (1,0) addend-pairs, the worst-case scenario occurs when k = n. Thus, the worst-case run time is O(n).
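As a small worked check of the 2k + 8 count, the sketch below computes k for two integers and evaluates the step bound. The function names are illustrative assumptions, and the code evaluates only the formula stated in the text; it does not simulate tile attachment.

```python
def longest_propagate_run(a, b, n):
    """Length k of the longest run of bit positions where a and b differ."""
    differing = a ^ b  # a 1 bit marks a (0,1) or (1,0) addend-pair
    best = run = 0
    for i in range(n):
        if (differing >> i) & 1:
            run += 1
            best = max(best, run)
        else:
            run = 0
    return best


def parallel_steps(a, b, n):
    """Step count 2k + 8 taken from the analysis in the text."""
    return 2 * longest_propagate_run(a, b, n) + 8


# Example from Figure 5: the two low-order pairs differ, so k = 2.
print(parallel_steps(0b1001, 0b1010, 4))
# Worst case: every pair differs, so k = n and the count grows linearly in n.
print(parallel_steps(0b11111111, 0b00000000, 8))
```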



O(log n) - average case. We now show that the average case run-time is O(log n) under the timing model
presented in Section 2.3. In two n-bit randomly generated binary numbers, A and B, the probability of one
of the (1,0) or (0,1) addend-pair cases occurring at (A_i, B_i) is 1/2. The sequence of (A_i, B_i) bit pairs can thus
be thought of as a Bernoulli process in which the likelihood of occurrence of a (0,1) or (1,0) addend-pair
is equal to the occurrence of a (1,1) or (0,0) addend-pair. As shown above, the runtime is bounded by k,
the length of the longest contiguous sequence of (0,1) or (1,0) addend-pairs, which might be thought of as the
longest sequence of heads in n independent fair coin tosses. Using Lemma 4.1, the expected longest run of heads
in n coin tosses is O(log n) [13]. Therefore, the average case time complexity of the tile addition algorithm
described above is O(log n).
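The logarithmic growth of the expected longest run can be checked numerically. The following sketch computes the exact expectation for fair coin tosses by counting binary strings with a bounded run length; it is an independent sanity check under the coin-toss model above, not the proof of Lemma 4.1, and the function names are assumptions made for illustration.

```python
from fractions import Fraction
from math import log2


def count_strings_with_max_run(n, m):
    """Number of length-n binary strings whose longest run of 1s is at most m."""
    # Strings of length i <= m cannot exceed the limit, so all 2**i qualify.
    counts = [2 ** i for i in range(min(n, m) + 1)]
    for i in range(m + 1, n + 1):
        # A valid string ends in a 0 followed by j - 1 ones, for j = 1 .. m + 1.
        counts.append(sum(counts[i - j] for j in range(1, m + 2)))
    return counts[n]


def expected_longest_run(n):
    """Exact expected longest run of heads in n fair tosses, as a Fraction."""
    total = 2 ** n
    # E[L] is the sum over m >= 1 of P(L >= m).
    return sum(
        Fraction(total - count_strings_with_max_run(n, m - 1), total)
        for m in range(1, n + 1)
    )


for n in (16, 64, 128):
    print(n, float(expected_longest_run(n)), log2(n))
```

Printing the expectation next to log2(n) makes the comparison with the O(log n) claim direct.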



4.3 Correctness. 
Figure 9: The scaffolding needed for the computation.

The analysis for the average case O(log n), worst case O(n) adder TAC will begin from the point of the
construction after which all scaffolding is in place (Figure 9).

The first step for any addend-pair in the seed is to perform its addition. This addition step will output
either (1, x), (0, x), or (x, ¬x) according to the following formulas:

1 + 1 = (1, x)

0 + 0 = (0, x)

1 + 0 = (x, ¬x)

0 + 1 = (x, ¬x)



The first value in the output represents the carry-out of the addend-pair and the second value represents 
the value of the addition. In both cases, x refers to the unknown carry-in which will come from the previous 
addend-pair's carry out. 

Given any addend-pair, we can then determine whether or not we have enough information to generate
a carry by looking at the first position of the generated output pair in the first step. The two cases that can
generate a carry, (1, x) and (0, x), do just that and immediately propagate a 1 or 0 respectively. The other
case, (x, ¬x), simply waits for a carry to come in. No addend-pair can calculate its value until it receives a
carry from the previous addend-pair. Once an addend-pair receives a carry-in, it can replace any x or ¬x
with the proper value and it can "print" the correct value, i.e. the solution at that particular position. Also,
if its carry-out was x, it can now propagate its carry to the next addend-pair.
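The symbolic outputs and their resolution can be mirrored directly in code. The sketch below is a sequential model of the rule, not of the parallel assembly: it represents the unknown carry-in as the string "x" and its negation as "~x" (both encodings are assumptions of this example), and then checks the result against ordinary integer addition.

```python
def pair_output(a, b):
    """Output (carry_out, sum_bit) of an addend-pair before its carry-in is known."""
    if a == 1 and b == 1:
        return (1, "x")
    if a == 0 and b == 0:
        return (0, "x")
    return ("x", "~x")


def resolve(output, carry_in):
    """Substitute a known carry-in (0 or 1) for x and ~x."""
    carry_out, sum_bit = output
    if carry_out == "x":
        carry_out = carry_in
    sum_bit = carry_in if sum_bit == "x" else 1 - carry_in
    return carry_out, sum_bit


def add_bits(a_bits, b_bits):
    """Add two equal-length bit lists (least significant bit first)."""
    carry, result = 0, []  # the carry into the least significant pair is always 0
    for a, b in zip(a_bits, b_bits):
        carry, s = resolve(pair_output(a, b), carry)
        result.append(s)
    return result + [carry]


def to_bits(value, n):
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


print(from_bits(add_bits(to_bits(0b1001, 4), to_bits(0b1010, 4))))
```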



5 Optimal O(√n) Addition

We show how to construct an adder TAC that achieves a run time of O(√n), which matches the lower
bound proved in Theorem 3.4. This adder TAC closely resembles an electronic carry-select adder in that
the addends are divided into sections of size √n and the sum of the addends comprising each section is computed
for both possible carry-in values. The correct result for the subsection is then selected after a carry-out
has been propagated from the previous subsection. Within each subsection, the addition scheme resembles
a ripple-carry adder. This construction works well with massive parallelism and allows us to construct an
optimal O(√n) adder TAC in two dimensions.

Theorem 5.1. There exists a 2D n-bit adder TAC with a worst case run-time of O(√n).

The proof of Theorem 5.1 follows from the construction of the tile assembly adder in Sections 5.1, 5.2,
and 5.3.
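For readers less familiar with carry-select addition, the following sketch shows the arithmetic idea on ordinary integers: split the addends into √n sections of √n bits, compute both candidate sums for every section, and select once the carry from the section below is known. It is a sequential illustration with assumed names, not the tile construction itself, and it assumes n is a perfect square.

```python
from math import isqrt


def carry_select_add(a, b, n):
    """Add two n-bit integers using isqrt(n) sections of isqrt(n) bits each."""
    m = isqrt(n)
    assert m * m == n, "this sketch assumes n is a perfect square"
    mask = (1 << m) - 1
    carry, result = 0, 0
    for row in range(m):  # row 0 holds the least significant bits
        a_row = (a >> (row * m)) & mask
        b_row = (b >> (row * m)) & mask
        # Both candidates depend only on the row's own bits, so in the tile
        # system they can be prepared for every row in parallel.
        candidates = (a_row + b_row, a_row + b_row + 1)
        chosen = candidates[carry]  # selection needs only the incoming carry
        result |= (chosen & mask) << (row * m)
        carry = chosen >> m
    return result | (carry << n)


# The nine-bit example used later in this section.
print(bin(carry_select_add(0b100110101, 0b110101100, 9)))
```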



5.1 Construction. 
Figure 10: The complete tile set. a) Tiles involved in bit addition. b) Carry propagation tiles. c) Tiles
involved in incrementation. d) Tiles that print the answer.

Figure 11: These are example I/O templates for the worst case O(√n) time addition introduced in Section 5.



Input/Output Template. Figure 11(a) and Figure 11(b) are examples of I/O templates for a 9-bit adder
TAC. The inputs to the addition problem in this instance are two 9-bit binary numbers A and B with the
least significant bits of A and B represented by A_0 and B_0, respectively. The north facing glues in the form
of A_i or B_i in the input template must either be a 1 or a 0 depending on the value of the bit in A_i or B_i.
The placement for these tiles is shown in Figure 11(a) while a specific example of a possible input template
is shown in Figure 12a. The sum of A + B, C, is a ten-bit binary number where C_0 represents the least
significant bit. The placement for the tiles representing the result of the addition is shown in Figure 11(b)
while a specific example of an output is shown in Figure 14.

To construct an n-bit adder in the case that n is a perfect square, split the two n-bit numbers into √n
sections, each with √n bits. Place the bits for each of these two numbers as per the previous paragraph,
except with √n bits per row, making sure to alternate between A and B bits. There will be the same amount
of space between each row as seen in the example template in Figure 11(a). All Z, Nc, and F' must be placed
in the same relative locations. The solution, C, will be in the output template such that C_i will be three tile
positions above B_i, and C will have a total size of n + 1 bits.

Below, we use the adder tile set to add two nine-bit numbers: A = 100110101 and B = 110101100 to 
demonstrate the three stages in which the adder tile system performs addition. 



Step One: Addition. With the inclusion of the seed assembly (Figure 12a) to the tile set (Figure 10),
the first subset of tiles able to bind are the addition tiles shown in Figure 10a. These tiles sum each pair
of bits from A and B (for example, A_0 + B_0) (Figure 12b). Tiles shown in yellow are actively involved in
adding and output the sum of each bit pair on the north face glue label. Yellow tiles also output a carry to
the next more significant bit pair, if one is needed, as a west face glue label. Spacer tiles (white) output a
B glue on the north face and serve only to propagate carry information from one set of A and B bits to the
next. Each row computes this addition step independently and outputs a carry or no-carry west face glue
on the westernmost tile of each row (Figure 12c). In a later step, this carry or no-carry information will be
propagated northwards from the southernmost row in order to determine the sum. Note that immediately
after the first addition tile is added to a row of the seed assembly, a second layer may form by the attachment
of tiles from the increment tile set (Figure 10c). While these two layers may form nearly concurrently, we
separate them in this example for clarity and instead address the formation of the second layer of tiles in
Step Two: Increment below.
Figure 12: Step 1: Addition.

Step Two: Increment. As the addition tiles bind to the seed, tiles from the incrementation tile set
(Figure 10c) may also begin to cooperatively attach. For clarity, we show their attachment following the
completion of the addition layer. The purpose of the incrementation tiles is to determine the sum for each A
and B bit pair in the event of a no-carry from the row below and in the event of a carry from the row below
(Figure 13). The two possibilities for each bit pair are presented as north facing glues on yellow increment
tiles. These north face glues are of the form (x, y) where x represents the value of the sum in the event
of no-carry from the row below while y represents the value of the sum in the event of a carry from the
row below. White incrementation tiles are used as spacers, with the sole purpose of passing along carry or
no-carry information via their east/west face glues F', which represents a no-carry, and F, which represents
a carry.

Figure 13: Step 2: Increment.
Step Three: Carry Propagation and Output. The final step of the addition mechanism presented
here propagates carry or no-carry information northwards from the southernmost row of the assembly using
tiles from the tile set in Figure 10b and then outputs the answer using the tile set in Figure 10d. Following
completion of the incrementation layers, tiles may begin to grow up the west side of the assembly as shown in
Figure 14a. When the tiles grow to a height such that the empty space above the increment row is presented
with a carry or no-carry as in Figure 14b, the output tiles may begin to attach from west to east to print
the answer. As the carry propagation column grows northwards and presents carry or no-carry
information to each empty space above each increment layer, the sum may be printed for each row (Figure 14).
When the carry propagation column reaches the top of the assembly, the most significant bit of the
sum may be determined and the calculation is complete (Figure 14).

Figure 14: Step 3: Carry Propagation and Output.



5.2 Time Complexity. 

Using the runtime model presented in Section 2.3, we will show that the addition algorithm presented
in this section has a worst case runtime of O(√n). In order to ease the analysis we will assume that
each logical step of the algorithm happens in a synchronized fashion even though parts of the algorithm are
running concurrently.

The first step of the algorithm is the addition of two O(√n)-bit numbers co-located on the same row. This
first step occurs by way of a linear growth starting from the least significant bit of the row and proceeding to
the most significant bit of the row. The growth of a line one tile at a time has a runtime on the order of the
length of the line. In the case of our algorithm, the row is of size O(√n) and so the runtime for each row is
O(√n). The addition of each of the √n rows happens independently, in parallel, leading to an O(√n) runtime
for all rows. Next, we increment each solution in each row of the addition step, keeping both the new and old
values. As with the first step, each row can be completed independently in parallel by way of a linear growth
across the row, leading to a total runtime of O(√n) for this step. After we increment our current working
solutions we must both generate and propagate the carries for each row. In our algorithm, this simply involves
growing a line along the west wall of the rows. The size of the wall is bounded by O(√n) and so this step takes
O(√n) time. Finally, in order to output the result bits into their proper places we simply grow a line atop the
line created by the increment step. This step has the same runtime properties as the addition and increment
steps. Therefore, the output step has a runtime of O(√n) to output all rows.

There are four steps, each taking O(√n) time, leading to a total runtime of O(√n) for this algorithm.
This upper bound meets the lower bound presented in Corollary 3.5 and the algorithm is therefore optimal.

Choice of √n rows of √n size. The choice for dividing the bits up into a √n × √n grid is straightforward.
Imagine that instead of using O(√n) bits per row, a much slower growing function such as O(log n) bits
per row is used. Then, each row would finish in O(log n) time. After each row finishes, we would have to
propagate the carries. The length of the west wall would no longer be bounded by the slow growing function
O(√n) but would now be bounded by the much faster growing function O(n / log n). Therefore, there is a distinct
trade-off between the time necessary to add each row and the time necessary to propagate the carry with
this scheme. The runtime of this algorithm can be viewed as max(row_size, westwall_size). The best way
to minimize this function is to divide the rows such that we have the same number of rows as columns, i.e.
the smallest partition into the smallest sets. The best way to partition the bits is therefore into √n rows of
√n bits.
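The trade-off can be made concrete with a short search over row lengths. The sketch below treats the dominant cost as max(row length, number of rows), following the max(row_size, westwall_size) view above and ignoring constant factors; the function names are assumptions made for this illustration.

```python
def dominant_cost(n, row_length):
    """max(row size, west wall size) when n bits are laid out in rows of row_length bits."""
    number_of_rows = -(-n // row_length)  # ceiling division without floats
    return max(row_length, number_of_rows)


def best_row_length(n):
    """Smallest row length that minimizes the dominant cost."""
    return min(range(1, n + 1), key=lambda r: dominant_cost(n, r))


for n in (9, 16, 1024):
    r = best_row_length(n)
    print(n, r, dominant_cost(n, r))
```

For perfect squares the minimum lands at a row length of √n, matching the choice made in the construction.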

5.3 Correctness. 

The first two steps of the algorithm are addition followed by incrementation. It is important to note that 
this incrementation step not only outputs both the original (addition result) and incremented value but also 
whether or not the original value contained a zero. This is important because it will allow us to later use 
this information to decide whether or not a row will contain a carry. Now, these two steps of the algorithm 
rely on nothing but the data in the current row and are completed in parallel across all rows. Therefore, 
with the given tile set in Figure 10, these two steps are straightforward to verify.

The next step is for every row to select a solution from the two that were generated as well as propagate
its carry information. We will, for the moment, assume that some row has not received the information of the
previous carry and will concentrate on this row. At this step we know whether the current row generated a carry,
C ∈ {T, F}, whether the row's sum contains a zero, Z ∈ {T, F}, and two possible row values A and B. A represents
the sum if the row receives a carry-in of 0 and B the opposite. In order to continue from this step an answer
must be selected and a carry or no-carry must be propagated. If Z = T, then we immediately know that
we will propagate whatever C may be. If Z = F and C = T, we also know that we must propagate a carry.
The only situation in which we do not know whether we will propagate a carry is when Z = F and C = F.
When we encounter this situation we propagate whatever the carry was in the previous row. Also, in order
to decide whether we will select A or B we need only the previous carry. Therefore, assuming we have the
correct previous carry we can correctly select both the proper value for this row as well as propagate the
correct carry to the next row.
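The carry rule based on C and Z can be checked exhaustively for small rows. In the sketch below, `generated` plays the role of C and `has_zero` the role of Z; these names, and the use of Python integers for row values, are assumptions of this example rather than part of the tile set.

```python
def row_summary(a_row, b_row, m):
    """Return (C, Z, low) for an m-bit row added with a carry-in of 0."""
    mask = (1 << m) - 1
    total = a_row + b_row
    low = total & mask
    generated = (total >> m) == 1  # C: the row itself produced a carry
    has_zero = low != mask         # Z: the m-bit sum contains at least one 0
    return generated, has_zero, low


def carry_out(generated, has_zero, carry_in):
    """Carry passed to the next row, following the three cases in the text."""
    if has_zero:
        return generated  # incrementing cannot overflow, so C decides
    if generated:
        return True
    return carry_in       # the sum is all 1s, so the incoming carry ripples through


def select_value(low, carry_in, m):
    """Choose between the sum (value A) and the incremented sum (value B)."""
    return (low + 1) & ((1 << m) - 1) if carry_in else low


m = 3
for a in range(1 << m):
    for b in range(1 << m):
        for cin in (False, True):
            c, z, low = row_summary(a, b, m)
            exact = a + b + int(cin)
            assert carry_out(c, z, cin) == ((exact >> m) == 1)
            assert select_value(low, cin, m) == exact & ((1 << m) - 1)
print("rule agrees with integer addition for all 3-bit rows")
```

The case Z = F with C = T never arises in this enumeration, since an all-ones m-bit sum cannot also overflow, so the rule for it is harmless.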

Finally, because we know the initial carry is correct (it is part of the seed), we know that the first row
can select the correct result as well as propagate the correct carry. Then, because we know that the first
row's carry is correct, we know that the second row can select the correct result as well as propagate the
carry. This chain continues until it reaches the last row, leading to selecting all the correct values as well as
propagating all the correct carries.

The last step is to select the most significant bit's value, which is solely based on the last row's carry. If
the last row propagates a carry it is a 1, otherwise it is a 0. Since we know that the last carry is correct we
know that the last value is selected properly.

6 Towards Faster Addition 

Figure 15: Arrows represent carry origination and direction of propagation for a) the O(log n) average case,
O(√n) worst case combined construction, and b) the O(√n) worst case construction.



In this section we combine the approaches described in Sections 4 and 5 in order to achieve both O(log n)
average case addition and O(√n) worst case addition. This construction resembles the construction described
in Section 5 in that the numbers to be added are divided into sections and values are computed for both
possible carry-in bit values. Additionally, the construction described here lowers the average case run time
by utilizing the carry-skip mechanism described in Section 4 within each section and between sections.

Theorem 6.1. There exists a 2-dimensional n-bit adder TAC with an average run-time of O(log n) and a
worst case run-time of O(√n).

The proof follows from the construction of the adder in Sections 6.1, 6.2, and 6.3.



6.1 Construction 

This construction combines the two-dimensional scaffold principle of the simple worst-case addition construction
(Section 5) with the principle that certain addend-pairs can compute a carry out before they get a carry
in, which was shown to reduce the average case run-time to O(log n) in Section 4.2. Contrary to the addition
construction in Section 5.1, the directionality of adjacent rows is antiparallel in the construction described
here. Every odd row beginning with row one, which is the southernmost row, has the least significant bit on
the east and the most significant bit on the west. Every even row has bits in the opposite order, as shown in
Figure 18a. Each odd row, along with the even row above, should be considered as a single section in which
the addition mechanism is nearly identical to the O(log n) average case adder TAC. Within each section,
carry outs are propagated east to west on the odd row, up in constant time from the most significant bit on
the odd row (OMSB) to the least significant bit on the even row (ELSB), and from west to east on the even
row. Adjacent rows are antiparallel so that the most significant bit (MSB) of each
section is a constant distance from the least significant bit (LSB) of the section above. This modification
allows us to apply the O(log n) average case addition between each pair of bits, and at the same time apply
the carry propagation mechanism of the O(√n) worst case addition construction between the MSB of each
even row.
Figure 16: Partial set of tiles necessary to implement O(log n) average case, O(√n) worst case combined addition
(see also next figure).

Figure 17: Tiles necessary to implement O(log n) average case, O(√n) worst case combined addition (continued
from previous figure).

Figure 18: a) The input template for addition of two n-bit binary numbers A and B. b) The output template
for the addition of two n-bit binary numbers A and B.



Carry Passing and Prediction of Results Within Sections. Within each section, or odd/even row
pair, carry outs are propagated as stated in the above paragraph (Figure 19a). In addition, each row
will present "predicted" results, that is, the result of the computation as if there were no carry from the
MSB of the previous section. These results are shown as yellow tiles in Figure 19b-c. Note the blue tiles
in Figure 19d. These blue tiles represent the final sum of the addend pairs. They may be computed
without carry information propagated from a section below because the addend pairs are either 1) located
in the southernmost section, where it is immediately known that the carry-in is 0, or 2) flanked by a
less significant addend pair for which the carry out is immediately known. The north face glues on yellow
tiles in Figure 19d which contain a star * rely on a carry from a previous section. The mechanism of carry
propagation within a section can be seen in Figure 20a-b. Vertical columns grow up along the west side of
each section to propagate the carry from the odd row to the even row.



Carry Propagation Between Sections. Figure 20c shows a vertical column growing between the southernmost
section and the section above. This column is propagating a carry from the most significant bit
of the southernmost section A to the least significant bit of the section B to the north. This information
continues to the most significant bit of section B, at which point a carry bit is computed to propagate to the
section north of B (Figure 20d, Figure 21a). The terminal tile assembly with the computed sum is presented
in Figure 21b.



6.2 Time Complexity. 

O(√n) Worst Case and O(log n) Average Case Run-Time. The construction presented in this section
represents a combination of the constructions presented in Sections 4 and 5. Thus, this runtime analysis
combines elements from the time complexity proofs in Sections 4.2 and 5.2. In the construction described here, each
section on the scaffold precomputes the sum as if there were a carry from the previous section and also as
if there were no carry from the previous section. In some cases, whether a carry bit is passed in from the
section below is irrelevant and parts of the final sum may be generated before this information is available.
If the most significant addend-pair bits (MSB) of a given section can immediately generate a carry out for
propagation to the next section, they do so. When a carry in arrives from the previous section, final values
are selected from the precomputation step, if necessary. In this analysis of time complexity, we first analyze
the run time of the precomputation step, and then consider the run time of propagating carry bits from
section to section up the scaffold.

Consider the binary sequence T of length 2n composed of n-bit binary numbers A and B as defined in
Section 4.2. For this construction, divide T into √n/2 smaller sequences, or sections, S_0, S_1, ..., S_{√n/2-1}, with
each section containing 2√n addend-pairs. These sections are arranged on a scaffold as depicted in Figure
22. Note that the distance between the bottom half and the top half of a given section is constant and
requires a constant number of steps to traverse. We first treat each section, S_i, as an independent addition
problem without regard for a carry in from the previous section. Define k as the length of the longest contiguous
sequence of (0,1) and (1,0) addend-pairs in S_i. It follows from the proof in Section 4.2 that the run-time for S_i is
bounded above by k. The worst-case runtime for addition over the sequence S_i would thus
occur when k = 2√n. Therefore, the worst case run time for addition within a section is O(√n). It also
follows, as described in Section 4.2, that the expected value of k is O(log n). Therefore, the average run time
for each addition within each section is O(log n). The precomputation within each of the √n/2 sections occurs
independently in parallel, leading to a worst case O(√n), average case O(log n) precomputation run-time
for all sections.

After performing this precomputation within each section, we must propagate carries between each
section. The distance between two neighboring sections is constant and may be traversed by a column of
tiles in a constant number of steps. Therefore, a carry out bit, once generated, can be propagated from one
section to the next in constant time. A carry out bit may be propagated to the next section, S_{i+1}, immediately
if the most significant addend-pair of section S_i consists of (0,0) or (1,1). Otherwise, this most significant
addend-pair of S_i must wait for a carry in before generating a carry out. Therefore, the composition of the
most significant addend-pairs of the sections acts as a limiting factor to the speed with which carries may
be propagated across all of the sections. Consider the binary sequence P which is comprised of the most
significant addend-pairs of each section and has a length of √n/2 addend-pairs. Let k be the length of the longest
contiguous sequence of (1,0) and (0,1) addend-pairs in P. The propagation of carry bits through P is bounded
above by k, with the worst case being when k = √n/2 and the average case being when k = O(log n).
Thus, the propagation of carry bits between each section up to the most significant addend-pair of T is
bounded above by O(√n) and has an average run-time of O(log n). Therefore, this addition algorithm
has an upper bound of O(√n) and an average run-time of O(log n).
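The two quantities that drive this analysis, the longest run inside any section and the longest run in the sequence P of most significant addend-pairs, can be computed for concrete inputs. The sketch below assumes n is a perfect square with an even root so that sections of 2√n addend-pairs tile the input exactly; the function names are illustrative, and the code measures run lengths only, not assembly time.

```python
from math import isqrt


def longest_run(flags):
    """Length of the longest run of True values."""
    best = run = 0
    for flag in flags:
        run = run + 1 if flag else 0
        best = max(best, run)
    return best


def combined_run_lengths(a, b, n):
    """Return (longest run within any section, longest run in P)."""
    root = isqrt(n)
    assert root * root == n and root % 2 == 0, "sketch assumes an even square root"
    size = 2 * root  # addend-pairs per section
    differs = [((a ^ b) >> i) & 1 == 1 for i in range(n)]
    sections = [differs[s:s + size] for s in range(0, n, size)]
    within = max(longest_run(section) for section in sections)
    # P collects the most significant addend-pair of every section.
    between = longest_run([section[-1] for section in sections])
    return within, between


# Worst case for n = 16: every addend-pair differs.
print(combined_run_lengths(0xFFFF, 0x0000, 16))
```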

6.3 Correctness. 



Every row with north facing glues performs its addition in the exact same way as described in Section 4.1,
assuming an incoming carry of 0. As such we use the proof of correctness from Section 4.3 to show that this
step is correct. Also, every row with south facing glues performs this same addition except with an incoming
carry from the addition of the lower north facing glue side. Thus, the same proof also applies to this side.
Figure 19: a) Carry out bits are computed for each addend pair where a carry out can be deduced immediately.
b-c) "Predicted" results (yellow tiles) begin to attach. d) Blue tiles represent the final result.

Figure 20: a-b) Carry bits are propagated within sections
even row using vertical columns, c-d) Carry bits are propag;

Figure 21: a) A carry bit from the first section propagates through the middle section in this assembly. b)
The terminal assembly displaying the output C.

Figure 22: Top: T is separated into sections. Bottom: Each section of T is arranged on a scaffold. Arrows
represent the general direction of carry bit propagation. MSB denotes the most significant bit of a section,
and LSB denotes the least significant bit of a section.

Considering both of these additions as the first step, we can say that the first step correctly adds a section
with a carry-in of 0.

Since the first section has no sections before it, the addition of the first section has an input carry of 0,
allowing the copying of its value up to the final "display" row. Once an addend-pair knows its carry-in
it can display its results. This, combined with the fact that we proved the correctness of the addition of any
section with an input carry of 0, proves the correctness for the first section.

Once a section finishes the first step, there are three possible carry-outs from the MSB addend-pair in the
section: 0, 1, and 0*. If the section's MSB addend-pair propagates a 0 or 1, then somewhere in the section
some addend-pair generated its carry without depending on a carry from the previous section, leading to a
correct propagation. If the section's MSB addend-pair propagates a 0*, then no addend-pair in the section
was able to generate a carry, meaning that no addend-pair in the section contained equal bits. Therefore,
the section will correctly propagate whatever carry it received to the next section. These two clauses cover
all cases in terms of carry propagation from one section to the next section.

Every section other than the first performs its addition with an input carry of 0*, which indicates 
an unknown carry with a provisional value of 0. In other words, the section continues the calculation assuming 
an input carry of 0 but, because the carry is unknown, it cannot copy any undetermined values into the "display" 
row until it receives the real carry from the previous section. This unknown carry is propagated only until it 
reaches an addend-pair with equal bits, because such an addend-pair has already generated and propagated its 
carry regardless of any incoming carry. The beginning of any section other than the first is therefore 
calculated by the algorithm as if it were in the middle of a series of addend-pairs with unequal bits. When the 
correct carry propagates from the previous section to the current section, the 
correct values may then be copied into the "display" row. If a 0 is carried in, the previously calculated value 
is simply copied up into the "display" row. If a 1 is carried in, the inverse of the previously calculated value 
is copied instead. Hence any section that receives the correct carry from the previous section will "display" 
the correct result. We have shown that the first section propagates a correct carry, and therefore, by 
induction, all subsequent sections propagate correct carry-outs. We have also shown that if each section 
passes the correct carry-out to the next section, then the addition of addend-pairs within each section is 
performed correctly, producing the correct sum C of A and B. 
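
The two-step argument can be checked with a short sequential sketch. It is not the tile system and says nothing about assembly time; it only mirrors the logic of the proof. The names `add_by_sections`, `provisional`, and `pending` are assumptions introduced for illustration.

```python
def add_by_sections(a, b, n, section_len):
    """Add two n-bit integers section by section, LSB section first.

    Sequential illustration of the proof's two steps (hypothetical helper).
    """
    def to_bits(x):
        return [(x >> i) & 1 for i in range(n)]

    a_bits, b_bits = to_bits(a), to_bits(b)
    result = []
    carry_in = 0                       # the first section has a known carry-in of 0
    for start in range(0, n, section_len):
        sec_a = a_bits[start:start + section_len]
        sec_b = b_bits[start:start + section_len]
        # First step: add with provisional carry 0 (the 0* carry) and mark
        # every bit that still depends on the unknown incoming carry.
        carry, known = 0, False
        provisional, pending = [], []
        for x, y in zip(sec_a, sec_b):
            provisional.append(x ^ y ^ carry)
            pending.append(not known)
            if x == y:                 # equal bits fix the carry from here on
                carry, known = x, True
        # Second step: the real carry arrives. A 0 copies the pending bits up
        # unchanged, a 1 inverts them; determined bits are copied as they are.
        result += [bit ^ (carry_in if p else 0)
                   for bit, p in zip(provisional, pending)]
        carry_in = carry if known else carry_in   # a 0* section forwards its input
    result.append(carry_in)            # final carry-out becomes the top bit
    return sum(bit << i for i, bit in enumerate(result))
```

For instance, `add_by_sections(0b0111, 0b0001, 4, 2)` returns 8: the second section first computes the provisional bits 1 and 0, then inverts both when a carry of 1 arrives from the first section.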



7 Simulation 

Tile self-assembly software simulations were conducted to visualize the approaches to fast arithmetic 
presented in this paper, as well as to compare them to previous work. The adder tile constructions described 
in Sections 4-6 and the previous best [4] were simulated using the two timing models described earlier in this 
paper. 

Figure 23 demonstrates the power of the approaches described in this paper compared to previous work. 



8 An Extension Towards 3-Dimensions 

In this section we extend our construction into the third dimension to achieve an $O(\sqrt[3]{n})$ upper bound, 
which meets the $\Omega(\sqrt[3]{n})$ lower bound from Theorem 3.4. Due to space and time constraints, only a 
general overview is given. We begin by creating $\sqrt[3]{n}$ total $\sqrt[3]{n} \times \sqrt[3]{n}$ 
constructions exactly as per Section 6, stacked one atop the 
other in an alternating fashion such that every odd plate, beginning with the first, has its MSB in the NW 
corner while every even plate has its LSB in that same corner. We continue by applying the same addition 
algorithm that was presented in Section 6 to all plates, where every lower plate passes its carry-out to the 
plate above it. This is very similar to how carry-out bits are passed between sections in the construction 
described in Section 6. Finally, every lower plate will pass the appropriate carry to the next lower plate. 
Please see Figure 24 for a visual overview of this process. 
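
The effect of the alternating orientation can be sketched with a hypothetical coordinate map. This is an idealized layout, assumed here for illustration only: it ignores the constant spacing between sections and plates in the actual construction, and the names `cube_side` and `plate_position` are not from the paper.

```python
def cube_side(n):
    """Smallest integer s with s**3 >= n (avoids floating-point cube roots)."""
    s = 1
    while s ** 3 < n:
        s += 1
    return s


def plate_position(i, n):
    """Map bit index i (0 = LSB) to hypothetical (x, y, z) lattice coordinates."""
    s = cube_side(n)
    z, r = divmod(i, s * s)           # z selects the plate
    if z % 2 == 1:
        r = s * s - 1 - r             # odd-indexed plates are traversed in reverse
    y, x = divmod(r, s)
    if y % 2 == 1:
        x = s - 1 - x                 # rows alternate direction within a plate
    return x, y, z
```

Under this map, the most significant bit of one plate sits directly below the least significant bit of the next, so a carry passing between plates travels a constant distance, and every coordinate stays below `cube_side(n)`.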

Theorem 8.1. There exists a 3-dimensional n-bit adder TAC with an average run-time of $O(\log n)$ and a 
worst case run-time of $O(\sqrt[3]{n})$. 
Figure 24: An overview of the process to add two numbers in optimal time in 3D. 

We present a high-level sketch of the proof here. As with the combined $O(\log n)$ average case, 
$O(\sqrt{n})$ worst case addition presented in Section 6, the binary numbers to be added are separated into 
sections, each of length $O(\sqrt[3]{n})$. We arrange these numbers on $\sqrt[3]{n}$ 2D scaffolds of size 
$\sqrt[3]{n} \times \sqrt[3]{n}$. The dashed lines 
in Figure 24 are carries transmitted between plates in the third dimension. The lines on the north are 
carries transmitted from each odd plate to its next even plate. The dashed lines on the southernmost part 
of Figure 24 are the carries transmitted from the first grouped plates to the last plate. Since the numbers 
all fit compactly, with only a constant amount of space between plates and a constant amount of space 
between sections, any one side of the cube has length at most $O(\sqrt[3]{n})$. Therefore, this size 
constraint, along with the algorithms previously presented, allows us to achieve an optimal $O(\sqrt[3]{n})$ 
worst case time complexity with an average case complexity of $O(\log n)$. 



9 Future Work 

The results of this paper provide numerous directions for future work. One interesting open problem is 
whether it is possible to achieve n-bit multiplication in sublinear time and, if so, how close we can get 
to the known lower bound of $\Omega(\sqrt[d]{n})$ in dimension d. We conjecture that sublinear multiplication 
is possible, especially if 3D systems are considered. If restricted to two dimensions, either a sublinear 
result or a proof of an $\Omega(n)$ lower bound would be very interesting. 

Another direction is the study of the difference between the parallel run time model and the probabilistic 
run time model. With tile sets of constant size, can it be shown that probabilistic run time is at most a 
logarithmic factor slower in the size of the seed assembly? Can a better bound be proven? What connection 
can be made for non-constant sized tile sets? Some of these questions have been considered in [3] for systems 
seeded with a single tile. 

A final direction focuses on the use of non-deterministic tile assembly systems to improve 
expected run times even for maliciously designed worst case input strings. Is it possible to achieve $O(\log n)$ 
expected run time for the addition problem regardless of the input bits? If not, are there other problems 
for which there is a provable gap in achievable assembly time between deterministic and non-deterministic 
systems? 